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Abstract 

On computers, discrete problems are solved instead of continuous ones. 
One must be sure that the solutions of the former problems, obtained in real 
time (i.e., when the stepsize h is not infinitesimal) are good approximations 
of the solutions of the latter ones. However, since the discrete world is much 
richer than the continuous one (the latter being a limit case of the former), 
the classical definitions and techniques, devised to analyze the behaviors of 
continuous problems, are often insufficient to handle the discrete case, and new 
specific tools are needed. Often, the insistence in following a path already 
traced in the continuous setting, has caused waste of time and efforts, whereas 
new specific tools have solved the problems both more easily and elegantly. 

In this paper we survey three of the main difficulties encountered in the 
numerical solutions of ODEs, along with the novel solutions proposed. 
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1 Introduction 

When things get too complicated, it sometimes makes sense 
to stop and wonder: Have I asked the right question? 
Enrico Bombieri, from 'Prime Territory' in The Sciences. 
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The need for highly efficient numerical methods able to solve the challenging 
multiscale problems arising from countless and wide-spread applications is well known 
(see, e.g. [52]). The core of a simulating tool for differential models consists of one 
or more numerical methods for solving Ordinary Differential Equations (ODEs). 

In this paper we survey three of the main difficulties encountered in this field along 
with the surprising solutions proposed, often different from what both experience and 
tradition were suggesting. As a matter of facts, both tradition and experience in 
Mathematics are mainly focused to continuous quantities, while Numerical Analysis 
is obliged to face discrete quantities. 

The modern numerical treatment of ODEs starts with the introduction of com- 
puters. The approaches of the pre-computer and the post-computer era are quite 
different. While in the first case the central concept was the notion of convergence 
with respect to the annihilation of a parameter representing the stepsize of integration, 
in the second case the central concept has become the notion of stability, although 
often in the most elementary form. The reason of this change is due to the fact that 
there are convergent methods which give bad results even for very small values of 
the stepsize. This was not clear when the computations had to be made by hand 
and, therefore, only in limited quantity. Once computers allowed the execution of a 
huge number of operations within a small amount of time, the problem arose in all 
its gravity. 

This question was largely debated in the fifties and sixties. It is a great merit 
of G. Dahlquist to recognize that the difference equations describing the methods 
should inherit from the continuous problem not only the critical points but also 
the geometry around them. Although many authors date the birth of the so called 
geometric integration to a later period, it is our opinion that it has to coincide with 
the mentioned Dahlquist request. 

Actually, at that time, the main concern was about dissipative problems whose 
critical points are obviously asymptotically stable. Since for such problems the lin- 
earization in a neighborhood of the critical point very well describes the geometry 
around it, the use of the famous test equation (y' = Xy, 9?(A) < 0) is then justified 
and this gave rise to the so called linear stability analysis. 

The main result of such analysis is the definition of the region of absolute stability, 
which is the region of the g-plane (q = hX) in correspondence of which the critical 
point (the null solution) is asymptotically stable for the difference equation obtained 
by applying the numerical method to the above mentioned test problem. Of course, 
one may wish to obtain absolute stability regions no smaller than the stability region 
of the continuous problem (9R(A) < 0) and this led to the definition of A-stable 
methods. 

This request very soon turned out to be too restrictive, at least for the class of 
Linear Multistep Methods (LMMs), which the negative result of the Dahlquist barrier 
refers to: there are no A-stable explicit methods, and among the implicit ones, the 
best is the trapezoidal rule, which is only second order. 
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This shortage of A-stable methods explains the period of crisis in the use of LMMs 
and the upper hand of one-step methods (Runge-Kutta) over them. Eventually, in 
the nineties, the way to obtain A-stable LMMs was found and by now a very large 
amount of them are at our disposal. The idea to get them, already proposed by 
Dahlquist [22J P- 378], is very simple and it will be our first topic. 

Do all ODEs need large absolute stability regions? of course not. But difficult 
problems, called stiff problems, do. What are stiff problems? A definition, precise 
enough to be used in modern general-purpose codes, has been lacking for many yearsQ 
As matter of fact, a proper definition of stiffness exists since 1996 and it has also been 
usefully used in many modern codes. This relies upon a simple idea which will be our 
second topic. 

What about conservative problems? Here the geometrical properties the numer- 
ical method has to inherit are more difficult to establish, since they are much more 
perturbation dependent than dissipative systems. In fact, after Poincare (see, for 
example, [521 12]), it is well known that linearization does not help in this case, the 
literature being plenty of examples of ODEs sharing the same linear part, with the 
geometry around the stable (but not asymptotically stable) critical points changing 
drastically according to the nonlinear part. This implies that a linear stability anal- 
ysis on the method does not make sense, unless one is interested in solving linear 
problems. 

There are, however, other peculiarities which could be desirably inherited by the 
numerical method. For example, in a Hamiltonian problem describing the motion 
of an isolated mechanical system, one may ask to preserve a number of constants of 
motion, such as the Hamiltonian function, the angular momentum, etc.. Although 
the question has been under study for more then thirty years, no really useful steps 
forward have been made until recently. Usually, people have tried to mimic what 
physicists have done in the continuous case without impressive results, in spite of the 
great amount of work. For example, very sophisticated tools like backward analysis 
and KAM-theory have been considered to examine the long time behavior of the 
numerical solution produced by symplectic and symmetric methods. 

Nonetheless, there are a few exceptions where completely new strategies have been 
considered. One of these is represented by discrete gradient methods, which are based 
upon the definition of a discrete counterpart of the gradient operator [23 EH]- Such 
methods guarantee the energy conservation of the numerical solution whatever the 
choice of the stepsize of integration. At present, methods in this class are known of 
order at most two. More recently, a completely new approach has been developed 
that has allowed the definition of a very wide class of methods of any high order, 
suitable for the integration of Hamiltonian problems (51 El El El EE]: this will be our 
third topic. 

This paper contains three main sections, one for each theme described above. 
Each section will contain a number of subsections, describing the problem along with 

1 Even now, one can read, for example on Scholarpedia [30], that such a definition is not possible. 
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the principal attempts devised to solve it, and the idea that has inspired the new 
approach which, often, turns out to be completely different from the previous ones. 

We shall intentionally devote a larger part of the paper to discuss the last topic 
(i.e., the numerical integration of Hamiltonian problems), because of two reasons: 

• the presented results are relatively new; 

• Hamiltonian problems assume a paramount importance in modeling problems 
of Celestial Mechanics. 

2 A-stable Linear Multistep Methods 

In order to set the problem in its historical background, let us report what Hindmarsh, 
one of the leading experts, wrote in a famous paper [34J : 

As recently as 1960, the commonly held perception on ordinary differential 
equations in practical applications was that almost all of them could be 
solved with simple numerical methods widely available in textbooks. Many 
still hold that perception, but it has become more and more widely realized, 
by people in a variety of disciplines, that this is far to be true. 

Multiscale problems (in this setting called stiff problems, see the next section) 
arising in a wide spectrum of applications, very soon made the known methods, based 
on the concept of convergence for h approaching zero, inadequate. The new idea was 
to design methods working for finite values of h, at least for dissipative problems. 
This led to the already mentioned definition of A-stability. But the Dahlquist barrier 
established that the class of LMMs having such property is very scanty, or even empty 
if referred to the class of methods of order greater than 2. This situation lasted for at 
least thirty years. But it was again Dahlquist who had foreseen the possible solution, 
as clearly stated in [22j p. 378], regarding fc-step LMMs: 

. . . when k > 1, the k — 1 extra conditions to define a solution of the 
difference equation, need not to be initial values. One can also formulate 
a boundary-value problem for the differential equation. The boundary- 
value problem can be stable for a difference equation even though the root 
condition is not satisfied. This has been pointed out in an important article 
by J. CP. Miller. 

2.1 The new idea: Miller's algorithm 

Why did the above precognition remain sterile for many years? The successes in this 
direction obtained within the flourishing class of Runge-Kutta methods somehow 
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determined an unfavorable climate for Miller's idea to inspire a systematic analysis. 
But, in our opinion, this is not the whole story. It was also due to the concept of 
stability in Numerical Analysis, which was very vague at that time. The most rigorous 
definition of such concept was related to linear difference equations coupled with all 
initial conditions (initial value problems) H It was necessary to refine the concept of 
stability in the quoted Dahlquist phrase. We shall see in a moment that the stability 
in Miller's algorithm is in contradiction with the stability used in most numerical 
analysis books and reported in the footnote. In order to be clearer, let us describe 
Miller's algorithm in its easiest form. 

Consider the linear difference equation with initial values 

y n+2 = 100.5y n+ i - 50y n , n>0, y = V3; y\ = .5>/3. (1) 

Its solution is y n = 2~ n \/3. However, when using finite precision arithmetic, every 
computer, no matter how powerful, will fail to compute iteratively the solution of 
([I])Jj despite its simplicity. The algorithm designed by Miller, is able to find a very 
good approximation of the solution even on the poorest computer. It works as follows. 
Suppose we are interested in the solution for n < 10: just replace the second condition 
by yxo = 0. This transforms the original initial value problem (IVP) in a boundary 
value problem (BVP). In Figure HJ three solutions are represented: two of them, the 
BVP solution and the true one, are almost indistinguishable; the third one, i.e., that 
obtained iteratively, accumulates errors and very soon becomes negative]^ 

Even using the term "stability" in the vague definition often used in Numerical 
Analysis, it is evident that the BVP solution is much more stable than the IVP 
solution. To be more precise, the characteristic polynomial of (CD) has one root inside 
the unit circle and one outside. It cannot be considered stable according to the 
definition reported in Footnote [2j But why does it turn out to be stable for the 
BVP? The answer is simple: the definition of stability only concerns IVPs. For BVPs 
one must use the more general concept of conditioning. We do not report here the 
details (see, e.g. [38]). Once this new and more precise concept is used, then all 
the above questions become clear. Coming back to the above example, it turns out 
that a difference equation with one root inside and the other outside the unit circle 
is well conditioned if the two conditions that guarantee the uniqueness of its solution 
are placed one at the initial point and the other at the final point. This is enough 
not only to explain the result of the example, but also to explain, for example, the 
poor performances of the celebrated shooting method for solving continuous BVPs 
(see pj). 

2 With very few exceptions, in Numerical Analysis books, stability is equivalent to the so called 
root condition, which requires that all the roots of the characteristic polynomial of the difference 
equation obtained by applying the LMM to the test equation y' = (or to y' = Ay) lie inside the 
unit circle of the complex plane or on its boundary, in which case they must be simple. 

3 If, accidentally, a computer does the job, just replace the couple of coefficients (100.5, 50) by 
(1000.5, 500). The solution remains the same. 

4 If instead of n = 10 one takes n = 20, the first two solutions remain positive and indistinguish- 
able, while the third one reaches the impressive value — 10 20 . 
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Figure 1: Solution of problem (pQ) computed by Miller's algorithm. 

Miller's algorithm was deeply studied later on (see, e.g. [26j [16j EB EH HSJ H7] ) and 
eventually applied to design methods for ODEs, according to Dahlquist's suggestion. 
For details, see [14] and references therein. 

2.2 Abundance of ^-stable methods 

Once the vague concept of stability in Dahlquist's proposal became more precise with 
the introduction of the conditioning notion, the definition of a great variety of bi- 
stable LMMs followed. Of course, the definition of absolute stability, which requires 
that the asymptotic behaviour of the numerical solution be the same as that of the 
theoretical one, in relation to the test equation y' = Xy, 9ft(A) < 0, needed to be 
generalized accordingly. 

Definition 1 A k-step LMM coupled with h\ initial conditions and k 2 final conditions 
(k = k\ + k 2 ), is absolutely stable at q = hX, if the characteristic polynomial has k± 
roots inside the unit circle and k 2 roots outside^ 

In other words, the requirement that all the roots of the characteristic polynomial 
of the difference equation defining the methods must be inside the unit circle was too 
much restrictive. The freedom to have some of them outside the unit circle opens 

5 Note that the definition reduces to the old one when k% — 0, i.e. for LMMs coupled with only 
initial conditions. 
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wide possibilities. The only cost to pay is to shift some additional conditions to the 
end of the interval. Such methods have been called Boundary Value Methods (BVMs) 
(see, e.g. [H]). We are now plenty of A-stable linear multistep methods: even the 
methods with the highest possible order with respect to the given number of steps 
(Top Order Methods) are A-stable. 

The above result has made it possible to write efficient codes for stiff (multiscale) 
problems based on LMMs (the codes GAM, GAMD, GAMP [361 EZ] BIM, BIMD 
[H |5], and TOM. See also the Test Set for IVP Solvers [48J for benchmarks of the 
most popular codes for ODEs). 



Stiffness is a mathematical term to denote multiscale problems. At least, this was 
the first meaning of the term, although during the years the terms has been used in 
a great variety of meanings (for a historical account see [11J). A common way this 
term is perceived currently, is the following: a stiff problem is a problem for which 
explicit methods do not work. It is glaringly evident the inadequacy of such definition 
both for the mathematical needs and for logical reasons. 

Despite the mentioned variety of definitions, the design of powerful codes needs a 
very precise definition of stiff problems, in order to be able to automatically recognize 
them and choose the appropriate strategies (appropriate methods, mesh selections, 
etc.). The situation assumes a paradoxical aspect: from the one hand some experts 
claim the impossibility of giving the needed precise definition of stiffness, and from 
the other, such definition does exist and is also published in a book [TJ] pp.237 ff.]. 
The reason of such reluctance in accepting the mentioned precise definition stays, 
in our optimistic opinion, in the fact that the definition is based on a very simple 
idea, as we shall see soon. Fortunately, things are changing in recent years since the 
new definition has been used to improve the performance of some celebrated codes 

[13 EH EH [19]. 

3.1 The simple new idea 

The two plots in Figure |2] report two functions: one constant and the other rapidly 
variable. How is it possible to distinguish their behaviors without looking at them? 
Simply, just compute the areas under their graphs and compare the results. Let T 
be the interval, for technical reasons we normalize the two areas by dividing them by 
T. We get, respectively, 



3 Stiffness 
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Constant solution 



Rapidly varying solution 
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Figure 2: Functions with different behaviors: constant (left plot) and rapidly varying 
(right plot). 

Suppose that y(t) = e xt y with A negative. We get: 
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The ratio 

k c T 

U C — 1 

is the ratio of two times: the integration interval (T) and the decaying time (rj^)- 
Of course, the important case is the one in which the two types of solutions occur 
in the same problem. The quantity a c measures the stiffness (or the multiscalar- 
ity) of the problem. The simple definition given above has been generalized to more 
general differential problems, autonomous and non autonomous dissipative problems, 
conservative problems, etc. (see [TTJ [T2J dSl EEH EH])- Of course the extension of the 
definition to more general problems needs the introduction of more involved techni- 
calities, but the leading idea remains unchanged: the definition of stiffness needs two 
measures, such as the infinity norm (k c ) and the L\ norm (7J, and the ratio between 
the two@ 

The above definition of stiffness deals with the continuous case. Of course, a 
very similar definition can be introduced for discrete problems, thus leading to the 
corresponding parameters n^, Jd, &d- The two set of parameters (called conditioning 
parameters) , turn out to be useful also to tell if the discrete problem is appropriate 
for approximating the corresponding continuous one. In fact one has: 

6 Actually, the maximum over an appropriate set of ratios. 
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Definition 2 A continuous problem is well represented by a discrete one if 
and 7 C ~ j d . 

The use of the above precise definition, based on computable conditioning parameters, 
has allowed: 

- to automatically recognize stiff problems; 

- to define efficient variable mesh selection strategies. 

As was pointed out above, some existing codes and new ones take advantage from 
the use of these conditioning parameters [151 [T7t fT8^ [T9| [2(7| [^3t [^5], 

4 Hamiltonian Problems 

Hamiltonian problems form a subclass of conservative problems. They assume the 
form 



where y = {q T ,p T ) T , q,p G and H is a sufficiently smooth scalar function. 

As was observed in the introduction, the main difficulty in dealing with them 
numerically stems from the fact that the meaningful isolated critical points of such 
systems are only marginally stable: neighboring solution curves do not eventually 
approach the equilibrium point either in future or in past times. 

This implies that the geometry around them critically depends on perturbations 
of the linear part. Consequently, the use of a linear test equation, which essentially 
captures the geometry of the linear part, whose utility has been enormous in settling 
the dissipative case, cannot be of any utility in the present case. 

It is then natural to look for other properties of Hamiltonian systems that can be 
imposed on the discrete methods in order to make them efficient. The first property 
which comes to mind is the symplecticity of the flow (p t := y i— >■ y(t) associated with 
©• This property can be described either in geometric form (invariance of areas, 
volumes, etc.) or in analytical form: 



In one way or the other, it essentially consists in moving infinitesimally on the tra- 
jectories representing the solutions. Infinitesimally means retaining only the linear 
part of the infinitesimal time displacement St. It can be shown that this produces 
new values of the variables q + 5q, p + 5p which leave unchanged the value of the 
Hamiltonian H(q + 5q,p + 5p) = H(q,p) (Infinitesimal Contact Transformation (ICT, 
see [2H p. 386])). 




dy 
dt 




(3) 
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Consequently, since the composition of two or more such infinitesimal transfor- 
mations maintain the invariance, so does an infinite number of them. The beauty 
of such result in the continuous case has perhaps created the expectation that similar 
beautiful achievements could be obtained in the discrete case. It is not surprising that 
the first attempts, to numerically approach the problem, aimed at devising symplec- 
tic integrators ([551 Ruth (1983)] , [251 Feng Kang (1985)]; see also the monographs 
[561 W2\ [33] for more details on the subject). 

Although in particular circumstances, for example in the case of quadratic Hamil- 
tonians or, more in general, in the case of Hamiltonian systems admitting quadratic 
first integrals, this approach has provided very good resultsjll it cannot be considered 
conclusive in treating general Hamiltonian problems. 

A backward error analysis has shown that symplecticity somehow improves the 
long-time behavior properties of the numerical solutions. For a symplectic method of 
order p, implemented with constant stepsize h, the following estimation reveals how 
the numerical solution y n may depart from the manifold H(y) = H(y ) of the phase 
space: 

H{y n )-H{y ) = O{nhe-^h p ), (4) 

where ho > is sufficiently small and h < /iq. Relation (jlj) implies that a linear drift 
of the energy with respect to time t = nh may arise. However, due to the presence 
of the exponential, such a drift will not appear as far as nh < ; this circumstance 
is often referred to by stating that symplectic methods conserve the energy function 
on exponentially long time intervals (see, for example, [331 Theorem 8.1]). This is 
clearly a surrogate of the definition of stability in that the "good behaviour" of the 
numerical solution is not extended on infinite time intervals. Even more alarming is 
the fact that if one wants to compute the numerical solution over very long times (as 
is done, for example, in the study of the stability of the Solar System), on the basis 
of (HI), he may be obliged to reduce the stepsize below a safety threshold, which is in 
contrast with the spirit of the long time simulation of dynamical systems where the 
use of very large stepsizes is one of the primary prerogatives]^ 

Where is the weakness of the approach? It is just in the above outlined words 
infinitesimal and infinite, which should be prohibited in Numerical Analysis. This 
discipline, in fact, has to deal with nonzero (greater than machine precision) and finite 
(bounded either by the patience of the operator or by the cost of energy) quantities. 
In other words, following this approach, the situation of the pre-Dahlquist era for 
dissipative problems has been recreated for conservative problems, in the sense that 
before Dahlquist there already existed methods, even with high order of convergence, 
that for h small enough would do the job (for example, the midpoint and Simpson's 
methods). 

7 For example, a symplectic Runge-Kutta method precisely conserves all quadratic invariants of 
the motion. 

Another constraint is that hy is assumed small enough. In our opinion, when possible, expressions 
like "for h small enough" should be avoided in Numerical Analysis: we like to believe that geometric 
integration has been devised just to eliminate such an expression. 
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Coming back to linear problems, not all symplectic methods provide a conservation 
of the Hamiltonian function even in this simpler case: the following example has been 
taken from [14, Example 8.2.1 on p. 189]. Consider the harmonic oscillator problem 
in Hamiltonian form 

Let h be the integration step and consider the numerical method defined by 

Since the continuous solution in h is e hJ yo, it is not difficult to deduce that the method 
is first order: e Jh — Mh = 0(h 2 ). The method is symplectic, since M^JM h = J, but 
fails to be conservative since, considering that H(q,p) = (q,p)(q,p) T /2, we have 

UlUl) ( l n+1 ) = (Qn,Pn)M^M h (l n )* (q n ,Pn) ( ) . 

V Pn+1 J V P n J V P n J 

The matrix is orthogonal only if the term h 2 is not present, according to the 
ICT hypothesis. But, unfortunately, this cannot be accepted since we want to use 
the method with finite values of h. 

The literature about symplectic integrators is quite wide since Numerical Ana- 
lysts, Physicists and Engineers have been working on them since more than 25 years. 
Consequently, this testifies their importance in the applications. However, other ap- 
proaches have been attempted, among which: 

- projection methods [21 EI] and numerical integrators on manifolds (see, e.g., 
PI Sect. IV.5.3]); 

- definitions of discrete counterparts of operators describing conservative vector 
fields, such as discrete gradients [23 HH], discrete divergence [2H], averaged 
vector fields [5U [32], discrete line integrals [6] (see the next section for an 
introduction to the latter methods). 

- generalizing the definition of symplecticity so as to include some nonlinearity 
in it (state dependent symplecticity [40]). 

Remark 1 It is worth mentioning that a Hamiltonian system may have other con- 
stants of motion, consequently the following question arises: suppose that we are able 
to devise methods conserving the energy, are the methods also able to preserve, for 
example, quadratic invariants? Few results have been presented so far regarding es- 
sentially methods in the Runge-Kutta class, most of them rather pessimistic. The 
paper FJ7|/ gives a first positive answer to this issue. 
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From the above discussion it turns out that the problem considered is a very 
difficult one, although very important in the applications. The only solid result 
obtained in all these years seems to be the one establishing that, in order that a 
method can preserve the Hamiltonian functions, it must be symmetric (see, e.g., 
[14J), although, of course, this condition in not sufficient. 

4.1 The simple new idea 

The novel approach (see, for example [6] and references therein) starts from a trivial 
observation. Let u(t) be any smooth curve [to,ti] —> M. 2m passing through y at time 
to- Then, we have: 

/T(w(ti)) - H(y ) = £ jH{u{t))dt. (7) 
Of course, choosing u{t) as the solution y(t) to ([3]) yields 

Hiyih)) - H(y ) = Pv r H{y{t))$£ldt = f\ T H{y(t))JVH{y(t))dt = V. 
Jt at j to 

The above result, which implies the conservation of the Hamiltonian function along 
the trajectory y(t) at times t and t±, has been obtained by exploiting ([3]) and the 
skew-symmetry of the matrix J. 

Is it possible to obtain a similar result for a curve u(t) not coincident with the 
unknown solution y(t) but nonetheless approximating it to a given order? 

Surprisingly enough, the answer is positive. Let {o; :/ (t)}°2 =0 be a set of linearly 
independent scalar functions defined on [io,£i] and {jjYjZo a set of unknown vectors. 
For simplicity, we assume that 

(1) the interval [to,^i] coincides with [0,1]; 

(2) the functions {ujj{t)}JL are orthogonal^] 

(3) the integrals of such functions are easily expressible as linear combination of 
themselves 

By setting u'{t) = YllZo 7« w i(^); we obtain 

u(t) = y Q + J^7i / Wi(r)dr = y + V^u^t). (8) 
i=0 Jo i=0 

9 Some further simplification can be obtained by choosing an orthonormal basis [6]. 
10 This is not a severe restriction, since elementary trigonometric functions and polynomials do 
have such a property. 
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The vectors jj, j = 0, . . . , r' are uniquely determined in terms of linear combinations 
of the 7i, % = 0, . . . ,r — 1, according to the specific relations mentioned in item (3) 
aboveo Setting y\ = u(l) yields 

H( yi ) - H(y ) = YV /' VH (u(t))dt 
i=o Jo 

We now choose the {7j} as the Fourier coefficients of the function JVH(w(t)), 
i.e., we pose 

H = Vi f cOi(t)JVH(cj(t))dt, i = 0,..., r-1, (9) 
Jo 

where {rji} are scalars that normalize the orthogonal functions {ui(t)} in order to 
make them an orthonormal basis for the space L2QO, 1]) of square-integrable functions. 
Considering the way the vectors {74} are involved in the definition of u/(t), we see 
that i 

Vi = n u*(t)d?j , 1 > 0. (10) 

Theorem 1 With the choice the Hamiltonian assumes the same values at yo and 
at y 1 . 

Proof. We have 

r-l 

H(y 1 )-H{y ) = Y,r H 

i=0 

due to the fact that J is skew- symmetric. □ 

We have then proved that the Hamiltonian can be preserved on curves different 
from the solution. A rescaling of the form [0,1] — > [to, to + h] will introduce the 
stepsize h and the iteration of the procedure will cover all the integration interval of 
interest with the result that on each interval of length h there are at least two points 
(the end points) where the Hamiltonian assumes the same value. 

The r relations ()9]) define the unknown vectors {7^} implicitly, since they appear 
as part of the integrands via the curve u(t) in ([8]). Thus, in the general case, ([9]) 
have to be regarded as nonlinear integral equations. Therefore, in order to obtain a 
concrete numerical method we need: 

(a) to substitute the integrals with discrete sums without introducing errors in the 
quadrature step; 

11 For example, for the shifted Legendre polynomials that we shall consider later, one has: r' = r. 



Ui(t)(VH(u(t))ydt 



Jo 



J T 


I 







Ui{t)VH{uj{t))dt 
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(b) to design an algorithm to solve the resulting, usually nonlinear, system providing 
the {7J; 



(c) to check that the vector y\ = 00 (ti), which will be assumed as an approximation, 
at time ti, of the true solution y(ti), has indeed the desired order of accuracy, 
say p: \\y(ti) — yi\ \ = 0(h p+1 ), for a given integer p > 1. 

More in general, step (a) could be replaced by the assumption that primitive func- 
tions for the integrands are available in closed form, no matter whether they can be 
expressed via standard quadrature formulae (see the examples in [8j Section 4]). In 
any event, the requirement (a) can be completely fulfilled if the functions {u>i(t)} and 
H are polynomials!^! 

Let v be the degree of H(y), and r — 1 be the degree of u'(t). Then the integrands 
in are polynomials as well, of degree at most /i = riv — 1) + r — 1 = vr — 1. We 
need then quadrature formulae having degree of precision greater than or equal to fi. 
Of course, it will be advantageous to chose them of Gaussian type. 

As we will see in Subsection I4.3[ the resulting methods fall in the class of block- 
BVMs and have been called Hamiltonian Boundary Value Methods (HBVMs)J^I 

To describe how the above three tasks can be accomplished, we start with the 
simpler case where H(y) is a quadratic function and therefore problem ([3]) is linear. 
In particular, we will consider hereafter the harmonic oscillator problem realizing that 
the obtained method is in fact the Gauss-Legendre method. The approach will be 
then generalized to nonlinear Hamiltonians thus leading to the new formulae. 

4.2 Application to the Harmonic Oscillator Problem 

The harmonic oscillator problem is described in Eq. ([5]); note that VH(y) = (q,p) T . 
Of course, we are tempted to take as set {(Ji(t)}°Z the trigonometric orthogonal 
systems, since this would provide the exact solution, even for small values of r. 

Let us instead take for {uji} the set of shifted Legendre polynomials {Pi(t)}, which 
is orthogonal in [0, 1]. They may be defined by the Rodrigues formula 



P (t) = l, P 1 (t) = 2t-1, P 2 (t) = 6t 2 - 6t+ 1, P 3 (t) = 20t 3 - 30t 2 + 12t - 1. 



What we here need about such polynomials are the following two properties (5ij 
denotes the Kronecker symbol): 

12 Since polynomials can approximate regular functions within any degree of accuracy, the present 
approach works fine in more general contexts. Furthermore, another important case where issue (a) 
is fulfilled is that of trigonometric functions over one period (see [2ZJ p. 155]). 

13 If desired, these methods also admit a Runge-Kutta formulation (see, for example, [THHUH])- 




i > 



(we also set P-i(t) 



0). The first few are 
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(Li) £ PiWPjftdt = ^Sij, i,j > 0; 

(La) /J P 3 (x)dx = 5^(P i+1 (t) - P^t) + VPoW), J > 0. 

Comparing Property (Li) with the normalization condition ffTUl) yields 77$ = 2z + 1. 
In this example we set r = 3 and, hence, 

u'(t) = P {t)j + P 1 {t) ll + P 2 {t)j 2 , 

U{t) = y +|(Pl(t)+ J Po(t))TO+K P 2(t)- J Po(t))7l + ^(L , 3(t)-Pl(t))T2. 

Relations ([9]) become 

71 = |^7o, 

72 = \Jlx = -|7o, 

7o = |J(37o-lJ7o + 6yo) = i ^ F (-59J + 2 • 54J)y , 
from which, by setting S = 4.542^592 (—59/ + 2 • 54 J), we obtain 

u'(t) = ((2-59P (t)-10P 2 (t))I + 60P 1 (t)J)Sy , 
w(t) = 2/o + [(60P 1 (0 + 59P (0- p 3W)/+10(P 2 (f)-Po(i))L]^o- 
and therefore the residual, 

R(t)=u'(t)-Ju(t) = P 3 (t)JSy , 

is zero (collocation) when £ is a root of P$(t)- Since the roots of P${t) are the abscissae 
of the Gauss collocation method of order six, from the uniqueness of the collocation 
polynomial we conclude that our approach applied to linear problems leads to these 
formulae 

Interestingly, this approach leads to completely new formulae if applied to general 
nonlinear Hamiltonian problems. 

4.3 Hamiltonian Boundary Value Methods (HBVMs) 

Relations (jUJ) have been retrieved by imposing the energy conservation property at 
the end points of the curve u(t), t 6 [to,t + h], and we are now assuming that 
H(y) is a polynomial of degree v so that the integrals may be exactly evaluated, for 
example, by means of a Gaussian quadrature formula of sufficiently high degree. As 
was pointed out in Subsection 14.11 they form a block nonlinear system of dimension 
r which, once solved, will provide the expression of the curve u(t) and, hence, of the 
numerical solution at time t\ = t + h, namely y\ = u(t Q + h). The resulting methods 

14 As a matter of fact, it is well-known that Gauss methods conserve quadratic energy functions. 
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have been called Hamiltonian Boundary Value Methods (HBVMs) because they are 
naturally and conveniently recast as block-BVMs (see [8j [7] for more details about 
their formulation and implementation). 

What about their order of convergence? Let us consider again the shifted Legendre 
polynomials. Substituting ([S]) into (jHJ), and setting c = (t — to)/h, yields 

r-1 

onto + ch) = y + h 

j=o 

and, on differentiating with respect to c, 

r-1 

a/ 

3=0 

In the above relations, we have set f(y) = JVff(y)@ 

Both (fill and ( fl2l) have an interesting interpretation. Consider the unknown 
solution y(to + ch) of 



;(to + c/i)=y + /iV(2j + l) f Pj(x)dx [ P 3 (r)f(u(t Q + Th))dr (11) 
j=o Jo Jo 

itiating with respect to c, 

r-1 „i 

/(t + ch) = Y t (2j + l)P j (c) / P,(r)/(o;(to + r/i))dr. (12) 



y'(t + ch) = f(y(t Q + ch)), ce [0,1], 
2/(*o) = 2/o, 



(13) 



or, equivalently, of its integral formulation 

y{t Q + ch) = y + h I f(y{t + rh))dr. (14) 



(15) 



Let us consider the Fourier series of f(y(t + rh)) for r G [0,1]: 

oo /-l 

/(y(to + cfc)) = + l)P,-(c) / P,-(r)/(y(t + rA))dr. 



3=0 

In terms of such expansion, (flBl and (f!4l read 



{oo „i 
y'(t + c/i) = V(2j + l)Pj{c) / Pj(r)f(y{t + rh))dr, c E [0, 1], 
3=0 J ° 
y{to) = yo, 

I 

°o /"C /"l 

y(t + ch)=y + hJ2(2j + l) P,(x)dx P 3 {r)f{y{t + Th))dT, 

A-n Jo JO 



(16) 



(17) 

respectively. Consequently (TTTT) and (112H are defined by simply truncating the series 
on the right hand side of ( f!6l) and ( TTTj) . Of course, in the event that series (115j) actually 
contains a finite number of terms, there will be no difference between ffl6l) - ffT7|) and 
(IT3"|) - (IT4"|) . provided r is large enough. 



How close are the two set of problems? The answer is obtained by means of the 
Alekseev formula [1], by using the following preliminary result. 
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In fact, the new methods also make sense for non Hamiltonian problems. 



16 



Lemma 1 Let g be a suitably regular function and h > 0. Then 



[ Pj{r)g(Th)dr = 0(hP), j>0. 
Jo 



Proof. Assume, for sake of simplicity, 



g(rh) = J2 g - J T 1 (rhr 

n=0 



to be the Taylor expansion of g. Then, for all j > 0, 

\ P J (r)g(rh)dr = J2 9 -^ 1 h n P 3 (r)r"dr = 
Jo n=0 n - Jo 

since Pj is orthogonal to polynomials of degree n < j. □ 
Let us now define the functions 

F h (c, y) = V(2j + 1)^(0) t P 3 {r)f{y{t Q + rh))dr 

3=0 J ° 

and 

R h (c,y) = J2(2j + l)P j (c) / P^fiyito + rh^dr. 

Jo 



(19) 



From Lemma [U after setting g(rh) = f(y(t + rh)), we deduce that 

oo 

R h (c,y) = ^2a j (h)P j (c), (20) 

j=r 

with = 0{ti), j = r, r + 1, . . . and, therefore, Rh{c, y) = 0{h r ). 

Moreover, for any given t G [to, to + h], we denote by u(s, i, y) the solution of (TT2T) 
at time s and with initial condition u(t) = y. 

Lemma 2 (Alekseev (1961)) Consider the two initial value problems with the same 
initial condition 

z' = tp(t,z), z (t )=y 0l 

y' = ( p(t,y) + ^P(t,y), y(to) = vo, 

and suppose that (p is continuously differentiable with respect to the second argument. 
Then the two solutions z{t) and y{t) satisfy the following relation: 

r* dz 

y{t) - z{t) = —{t, c, y(c)) ^(c, y{c)) dc. (21) 
Jt dy 

Proof. See, for example, 13^ Theorem 14-5, p. 96] or fjlj Theorem 7.5.1, p. 205]. □ 
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We can now state the following result. 



Theorem 2 Let a; (to + ch) and y(t + ch), c G [0, 1], be the solutions of [W\l and 
(ESP, respectively. Then, 

y(t + h)-Cu(t + h) = O(h 2r+1 ). 

Proof. In terms of the functions Fh and Rh, problems ( JT2l) and ( jTBT) read 

u/(t + ch) = F h (c,uj), y'(t + ch) = F h (c,y) + R h (c,y), 

respectively. From the Alekseev formula fl2T]) one then obtains, by virtue of Lemma [1] 
and ([20]), that 

y(t + h) - u(t + h) = 

= h -—(t + h,t + Th,y(t + Th))R h (T,y)dT 
Jo dy 

= h ^\J Pj^-Q^^o + h^o + ^yito + rh^dTja^h) 

= hO{h r )0{h r ) = 0{h 2r+1 ). □ 

As a direct consequence, we obtain the following result. 

Corollary 1 Let T = Nh be a fixed positive real number, N being an integer. Then, 
the approximation to the solution of problem 

y'(t) = f(y(t)), t E [t , to + T], y(t ) = y , 

by means of 

r-l „i 

L0'{t i _ 1 +ch)=J2{2j + l)P j {c) / Pj(T)f CE [0,1], l = l,...,N, 

where ti = t { _i + h, i = 1, . . . , N , and u(t ) = y , is 0{h 2r ) accurate. 
4.4 A numerical example 

In order to make clear the advantage of using the energy-preserving HBVMs f|T2l) over 
standard symplectic methods, we just mention that standard mesh selection strategies 
are not advisable for symplectic methods (see, e.g., [561 P-127], [121 p. 235], [331 
p. 303]), since a drift in the Hamiltonian and a quadratic error growth is experienced 
in such a case. The example that we consider below gives a hint that this is not 
the case for HBVMs, provided that the integral in ( fT2|) is exactly computed (at least, 
numerically which, as observed before, can be always achieved, for all suitably regular 
Hamiltonian functions) . 
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Figure 3: Hamiltonian error growth over 1000 periods with a variable-step solution 
of problem (|22J)- (|23|| . e = 0.99, with tolerance tol = 10™ 10 . 



Remark 2 As also observed in Section \4-£\ we mention that when the integrals in 



IjJM) are approximated by means of the Gauss-Legendre formula with r points, then 
the Gauss-Legendre method of order 2r is obtained /2J/. 



We now consider the Kepler problem (see, e.g., [331 P- 9]), with Hamiltonian 



H(lm ?2, Pi, p 2 ] T ) = UpI + pD - -r^=a, ( 22 ) 



that, when started at 

( l-e, 0, 0, ^) T , (23) 

has an elliptic periodic orbit of period 2n and eccentricity e G [0,1). Moreover, 
in such a case, the (constant) value of the Hamiltonian is H = — |. When e is 
close to 0, the problem is efficiently solved by using a constant stepsize. However, it 
becomes more and more difficult as e — > 1, so that a variable-step integration would 
be more appropriate in this case. In Figures [3] and H] we plot the error growth in the 
Hamiltonian and in the solution, respectively, over 1000 periods, in the case e = 0.99, 
when using a standard mesh selection strategy (e.g., like (1.1) in [331 P- 303]) with the 
(symplectic) Gauss-Legendre method of order 6 and the HBVM ( fl2|) with r = 3 (then, 
again of order 6), where the integral is approximated by means of a Gauss formula 
with k = 12 points, then having order 24, which is sufficient to obtain, in this case, 
a practical energy conservation (see [6], and references therein, for full details). The 
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Figure 4: Error growth over 1000 periods with a variable-step solution of problem 
fl22D-fl23D, e = 0.99, with tolerance tol = HT 10 . 

latter method is in general denoted by HBVM(A;, r) [6], so that in the present case we 
consider the HBVM(12,3) method] 16 ! The tolerance used is tol = 10 -10 . As one can 
see, the Gauss-Legendre formula produces a drift in the Hamiltonian and a quadratic 
error growth, whereas the HBVM(12,3) exhibits a negligible error in the Hamiltonian 
and a linear error growth. This confirms that the symplectic Gauss-Legendre method 
is more conveniently used with a constant stepsize, whereas the energy preserving 
HBVM can be profitably used with a standard mesh selection strategy. 

In Tabled] we also list the number of points required by the HBVM(12,3) method, 
with variable stepsize, for covering an increasing number of periods: as one can easily 
deduce from the listed data, 153 steps are required to cover each period. In order 
to make clear the improvement over the symplectic sixth-order Gauss method, it is 
enough to observe that, in order to obtain a comparable accuracy, this method would 
require approximately 2 • 10 5 (constant) steps for each period! 

5 Conclusions 

We have reported three problems in Numerical Analysis considered difficult for as 
long as half a century and which, in our opinion, have been eventually resolved by 
dramatically changing the traditional approach suggested by experience. We reiterate 

16 According to what observed in Remark [51 the HBVM(3,3) method coincides with the Gauss- 
Legendre method of order 6. 



20 



Table 1: Statistics for the variable step implementation of HBVM(12,3) on the Kepler 
problem (J22H23J, e = 0.99, with tolerance tol = 10~ 10 . 



periods 


error 


points 


i nn 






200 


1.36e-03 


30600 


300 


2.04e-03 


45900 


400 


2.72e-03 


61200 


500 


3.41e-03 


76500 


600 


4.10e-03 


91800 


700 


4.79e-03 


107100 


800 


5.48e-03 


122400 


900 


6.17e-03 


137700 


1000 


6.85e-03 


153000 



that this is certainly due to the fact that past studies were heavily biased by the 
concept of continuity, whose peculiarities the researchers have tried to import in the 
new context of Numerical Analysis, where such problems are, instead, of discrete 
nature: the two areas, i.e., the continuous one and the discrete one, are often not 
overlapping. 

Within this scenario, although it's certainly easier to follow the tracks already 
drawn and consolidated in the literature, exploring new routes sometimes allows to 
reach the goal more quickly and innovatively, as picturesquely described in the tale 
of the egg of Columbus. 
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